Poster: mccown Date: Jun 21, 2005 10:50pm
Forum: web Subject: Re: Advanced search yields current page

Igor- thanks for your reply.

From what I understand, if a user is using your advanced search and entering "http://www.foo.edu/mypage.html", they want to know what is the latest copy of mypage.html that you have indexed.

If you always return a fresh page off the live web, what purpose does that serve the user? That's like me asking to view Google's cached version of a page and them showing me the actual page.

I would think the page that should be returned to the user would be the most up-to-date page from the query:

http://web.archive.org/web/*/http://www.foo.edu/mypage.html

I look forward to you turning on the incremental index. I assume that will then show the pages you have most recently crawled using the above URL rather than pages from the past 6-12 months and beyond.

Frank

Poster: glenn Date: Jul 11, 2005 4:48pm
Forum: web Subject: Re: Advanced search yields current page

The 6-12 month lag comes from the delay in receiving updates from Alexa:

http://www.archive.org/about/faqs.php#103

Poster: Igor Ranitovic Date: Jun 30, 2005 6:41am
Forum: web Subject: Re: Advanced search yields current page

> From what I understand, if a user is using your advanced search and entering "http://www.foo.edu/mypage.html", they want to know what is the latest copy of mypage.html that you have indexed.
> If you always return a fresh page off the live web, what purpose does that serve the user? That's like me asking to view Google's cached version of a page and them showing me the actual page.

I am not sure if I was clear about this issue.
If we have http://foo.edu/mypage.html in the index, then WM will show you the latest version that we have.
For example, http://web.archive.org/apples.com.

If we don't have http://foo.edu/mypage.html in the index, then WM fetches the page from the live Web, archives the page, and displays it to the user.
Once this page is added to the index, WM will stop fetching it from the live Web.
The idea behind this is to archive pages that are not in the archive and are requested by users.
Unfortunately, WM does not produce any feedback in this case. This should be changed in the next WM revision so that users are aware of the process.
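
For readers following along, the flow described above can be roughly sketched as below. This is only an illustration, not the actual WM code: the `Archive` class, the `lookup` method, the `fetch_live` callback, and the "archive"/"live" source labels are all made-up names, and the sketch adds a fetched page to the index immediately, whereas the real index is updated later.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Archive:
    # Hypothetical in-memory stand-in for the index:
    # URL -> list of (timestamp, page body) captures.
    index: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def lookup(self, url: str, fetch_live: Callable[[str], str], now: str) -> Tuple[str, str]:
        """Return (body, source), where source is 'archive' or 'live'."""
        captures = self.index.get(url)
        if captures:
            # Already indexed: serve the latest capture and do not touch the live Web.
            # Timestamps are assumed to be fixed-width strings, so they sort correctly.
            _, body = max(captures, key=lambda capture: capture[0])
            return body, "archive"
        # Not indexed: fetch from the live Web, archive the page, and display it.
        body = fetch_live(url)
        self.index.setdefault(url, []).append((now, body))
        return body, "live"
```

Returning the source alongside the body is one possible way to give the user the missing feedback: the caller can tell whether the page came out of the archive or was just fetched.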

Take care,
i.

Poster: mccown Date: Jul 2, 2005 7:59am
Forum: web Subject: Re: Advanced search yields current page

Igor,

Yes, I understand your description of how the process works. It's like if I was viewing a search result in Google and I clicked on the link to see a cached version of the page. And if Google really hadn't cached the page, they would grab the live page and display that to the user saying they crawled the page today. Technically they would be correct, but had the page not been available from the live web, they would have had to say "sorry, but we didn’t actually cache that page." If their cache worked that way it wouldn’t really serve any purpose.

That, in effect, is what happens when I ask to see an archived version of a page that is not in the archive. I ask to see what is in the archive, but since the page is not in the archive you grab it and say "here it is!" Because I then think it's in your archive I may delete it from my web server. If I then ask for it again, you try to fetch it again from the live Web and say "sorry, but we don't have it!"

So all I’m saying is that if you have to fetch the page live from the Web, give the "Not in Archive" page so they’ll realize it’s not really in the archive yet.
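
To make the suggestion concrete, here is a minimal sketch of the behavior I have in mind. It is a guess at the shape of the logic, not how the Wayback Machine is built: `respond`, `index`, `pending`, `fetch_live`, and the status strings are all hypothetical names.

```python
from typing import Callable, Dict


def respond(
    url: str,
    index: Dict[str, str],
    pending: Dict[str, str],
    fetch_live: Callable[[str], str],
) -> dict:
    """Answer a request for an archived page without passing off a live fetch as archived."""
    if url in index:
        # The page really is in the archive: show the stored copy.
        return {"status": "archived", "body": index[url]}
    # Not in the archive: still try to capture the page for later indexing,
    # but report "not in archive" instead of displaying the live copy.
    try:
        pending[url] = fetch_live(url)
        queued = True
    except OSError:
        # The live page is unreachable, so there is nothing to queue.
        queued = False
    return {"status": "not_in_archive", "body": None, "queued": queued}
```

With this shape, the user gets the same "Not in Archive" answer whether or not the live fetch succeeded, so nobody deletes a page believing it was already archived.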

Frank

Poster: Igor Ranitovic Date: Jul 7, 2005 5:38am
Forum: web Subject: Re: Advanced search yields current page

Sounds good to me.

The important thing is feedback, so that it is clear what is happening.
